Skip to content

Conversation

@wbpcode
Copy link
Member

@wbpcode wbpcode commented Oct 18, 2025

Commit Message: http: new body transform filter with the substitution formatter
Additional Description:

To close #35783. There is enough context in the #35783. In short sentence, this PR add a transform filter to modify request or response headers by the substitution formatter API.

It's also possible to add body attributes to headers and refresh the routes.

Risk Level: low. New HTTP filter.
Testing: unit/integration.
Docs Changes: added.
Release Notes: added.
Platform Specific Features: n/a.

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #41612 was opened by wbpcode.

see: more, trace.

@wbpcode
Copy link
Member Author

wbpcode commented Oct 18, 2025

cc @arkodg cc @agrawroh for you guys may care about this. cc @botengyao for this may helpful for AI traffic routing.

Signed-off-by: WangBaiping <[email protected]>
Signed-off-by: WangBaiping <[email protected]>
Copy link
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Left a few API comments

// to the existing body. If not set, the result of ``body_format_string`` will replace
// the existing body.
// True by default.
google.protobuf.BoolValue patch_format_string = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is highly coupled with body_format_string and that is optional, would it make sense to move both to an internal message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

// And except the commonly used format specifiers, there are some additional format specifiers
// provided by the transform filter:
//
// * ``%RQ_BODY(X)%``: the request body. And ``X`` is the JSON pointer to extract a specific
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this require a JSON-body? I may have missed it, but this seems to be a generic transform, and doesn't require JSON-body to be sent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON body is required. We have documented it that only JSON body is supported for now.

// * ``%RQ_BODY(X)%``: the request body. And ``X`` is the JSON pointer to extract a specific
// field from the JSON body. If ``X`` is empty, the whole body will be used.
// * ``%RS_BODY(X)%``: the response body. And ``X`` is the JSON pointer to extract a specific
// field from the JSON body. If ``X`` is empty, the whole body will be used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whole body will be used

FWIW this might cause the headermap to go over the limit, and may be an attack vector, but as this filter doesn't have a security posture, it might be ok.

Also it would be good to clarify whether using this filter implies that all requests/responses are buffered, even if headers_mutations are defined without referencing RQ/RS_BODY. If there's a need to support both buffering and streaming modes, consider adding an enum field specifying the mode (and maybe only support one for now).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At next few weeks, I will try to support the event stream for the response. We will handle it as stream for that. But for JSON request/response, buffering always be used.


// Body transformation configuration. If not set, no body transformation will be performed.
// NOTE the ``RQ_BODY`` and ``RS_BODY`` format specifiers can also be used here.
config.core.v3.SubstitutionFormatString body_format_string = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on using JSONPatch semantics here ?
https://jsonpatch.com
its used in Envoy Gateway API to edit xDS https://gateway.envoyproxy.io/docs/tasks/extensibility/envoy-patch-policy/#customize-response

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. But it's will have big effect to the current implementation. I will update the API to keep the patch mode as enum and then we can support it in the future.

Copy link
Member

@botengyao botengyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding this filter! have one question out of curiosity.

Comment on lines +178 to +182
if (end_stream) {
decoder_callbacks_->addDecodedData(data, true);
handleCompleteRequestBody();
return Http::FilterDataStatus::Continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related but not tied / blocker to this PR, do you have some insights for the streaming JSON parser support without waiting for the whole body once we get the enough data. I think the current nlohmann lib doesn't quite support it, and we could need some SAX style approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rapidJson could help? But we removed that because suggestions from google. 🤣

@wbpcode
Copy link
Member Author

wbpcode commented Oct 28, 2025

friendly ping for a new review cc @adisuissa

@wbpcode
Copy link
Member Author

wbpcode commented Oct 29, 2025

/retest

// Only JSON body transformation is supported for now.
message TransformConfig {
// Configuration for transforming request. If set then the request will be buffered
// until the entire body is received and transformed for JSON request bodies.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this needs to also address HTTP trailers case?
(I'm not saying to add it now, but there may need to be a config knob in the future that decides what needs to be buffered before sending it down/up the stream. I believe this may be in the "Transformation" message).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM and make sense. Let's the future requirement/issue to tell us!


// Configuration for the transform filter. The request and response transformations are
// independent and could be configured separately.
// Only JSON body transformation is supported for now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there should just be a JsonTransform filter here, instead of something generic. The reason I bring this up is because the transformations seem to be highly coupled with JSON bodies. (this is a discussion and not a a call for action).
WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually depends the feedback from the users and community, I personally hope this filter could handle various content type like plain text or url form by extend other fields in this filter.
JSON is supported first is because that the JSON is used widely for now in open API and AI related things. In addition, for the response, we need to support event stream to support AI traffic.

// lookup key in the namespace with the option of specifying nested keys separated by ':'.
// * ``%RESPONSE_BODY(KEY*)%``: the response body. And ``Key`` KEY is an optional
// lookup key in the namespace with the option of specifying nested keys separated by ':'.
repeated config.common.mutation_rules.v3.HeaderMutation headers_mutations = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the headers be blocked until the body is buffered even if headers_mutations isn't configured?

Copy link
Member Author

@wbpcode wbpcode Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, yes for simplification of the code. But this is good point that could be optimized.


Hmmm, it will require the decodeData to handle the continue or stop of decodeHeaders correctly. Seems it's not deserved to optimize because the additional complexity 🤔 The filter will not be performant anyway because the body processing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this is an implementation issue, but I think it should be spelled clearly in a note above (example: buffering of the entire request headers and body will always happen on a request even if only headers are transformed).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.


// Body transformation configuration. If not set, no body transformation will be performed.
// NOTE the ``REQUEST_BODY`` and ``RESPONSE_BODY`` format specifiers can also be used here.
config.core.v3.SubstitutionFormatString body_format = 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here. This field type is SubstitutionFormatString which by the name suggests that there will be a substitution. However, the action can either be MERGE or REPLACE. Is the MERGE essentially a substitute? If so, what is REPLACE in this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Substitution in SubstitutionFormatString is means the substitution formatter will be used in the format string. For example, in the final generated output, %REQ(xxx)% in the format string will be substituted by the actual header value.

And the action is used to indicate how to handle the final generated output of the body_format and the original body content.

I will add some more comment to make it more clear.

repeated config.common.mutation_rules.v3.HeaderMutation headers_mutations = 1;

// The body transformation configuration. If not set, no body transformation will be performed.
BodyTransformation body_transformation = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see a case for multiple body transformations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. This is a one time logic. We will generate a new body based on the body_format and then the new body will replace or be merge into original body content.
If multiple phase transformations is necessary, the users could configure multiple transform filters in the filter chain anyway. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the same can be said about headers, but I agree that headers are more likely to have multiple transformations (one for each header key).

@yanavlasov yanavlasov self-assigned this Oct 29, 2025
@wbpcode
Copy link
Member Author

wbpcode commented Oct 30, 2025

/retest

Copy link
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall API LGTM.
Left a suggestion to clarify the implication of using this filter (buffering) in a note (for Envoy at least).

repeated config.common.mutation_rules.v3.HeaderMutation headers_mutations = 1;

// The body transformation configuration. If not set, no body transformation will be performed.
BodyTransformation body_transformation = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the same can be said about headers, but I agree that headers are more likely to have multiple transformations (one for each header key).

// lookup key in the namespace with the option of specifying nested keys separated by ':'.
// * ``%RESPONSE_BODY(KEY*)%``: the response body. And ``Key`` KEY is an optional
// lookup key in the namespace with the option of specifying nested keys separated by ':'.
repeated config.common.mutation_rules.v3.HeaderMutation headers_mutations = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this is an implementation issue, but I think it should be spelled clearly in a note above (example: buffering of the entire request headers and body will always happen on a request even if only headers are transformed).

Copy link
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api

@repokitteh-read-only repokitteh-read-only bot removed the api label Nov 3, 2025
@wbpcode
Copy link
Member Author

wbpcode commented Nov 3, 2025

/retest

@wbpcode
Copy link
Member Author

wbpcode commented Nov 7, 2025

We added a new type alias in this PR's previous commit. But another alias for same type was added in the main by recently merged PR. So, after merged the main, at the latest commit, we removed the unnecessary new alias and reuse the another alias.

@wbpcode
Copy link
Member Author

wbpcode commented Nov 7, 2025

gently ping for a review @yanavlasov

@agrawroh
Copy link
Member

agrawroh commented Nov 7, 2025

/retest

@wbpcode wbpcode enabled auto-merge (squash) November 8, 2025 00:06
@wbpcode wbpcode merged commit 4f03748 into envoyproxy:main Nov 8, 2025
26 checks passed
@wbpcode wbpcode deleted the dev-body-transform branch November 8, 2025 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Body Transformations

6 participants